AITopics | intrinsic reward function

Intrinsic Reward Functions

Neural Information Processing Systems, Apr-25-2026, 01:35:00 GMT

In our approach, the intrinsic reward can be separated into two parts. One is related to action-aware diversity, while the other is related to observation-aware diversity. We revisit the formulation of our information-theoretic objective (Eq. …).

A.1 Intrinsic Rewards for Action-Aware Diversity

First we analyze term 2, which is related to action-aware diversity. Term 2 is written in terms of the expected log-ratio $\mathbb{E}_{id,\tau}\left[\sum_{t}^{T-1} \log \frac{p(a_t \mid \tau_t, id)}{q(a_t \mid \tau_t)}\right]$ and the KL divergence $D_{\mathrm{KL}}\big(p(a_t \mid \tau_t) \,\|\, q(a_t \mid \tau_t)\big)$.

artificial intelligence, diversity, machine learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)

Maximum-Entropy Exploration with Future State-Action Visitation Measures

Bolland, Adrien, Lambrechts, Gaspard, Ernst, Damien

arXiv.org Machine Learning, Mar-20-2026

Maximum entropy reinforcement learning motivates agents to explore states and actions to maximize the entropy of some distribution, typically by providing additional intrinsic rewards proportional to that entropy function. In this paper, we study intrinsic rewards proportional to the entropy of the discounted distribution of state-action features visited during future time steps. This approach is motivated by two results. First, we show that the expected sum of these intrinsic rewards is a lower bound on the entropy of the discounted distribution of state-action features visited in trajectories starting from the initial states, which we relate to an alternative maximum entropy objective. Second, we show that the distribution used in the intrinsic reward definition is the fixed point of a contraction operator and can therefore be estimated off-policy. Experiments highlight that the new objective leads to improved visitation of features within individual trajectories, in exchange for slightly reduced visitation of features in expectation over different trajectories, as suggested by the lower bound. It also leads to improved convergence speed for learning exploration-only agents. Control performance remains similar across most methods on the considered benchmarks.
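
As a rough illustration of an intrinsic reward that is proportional to the entropy of a discounted distribution of future features, the sketch below works on a single finite trajectory of discrete feature indices. It is a simplified empirical stand-in, not the estimator from the paper: the function name `future_visitation_entropy`, the choice to count the current step as part of the "future", the renormalisation for finite horizons, and the default `gamma` are all assumptions made for this example. The backward recursion only loosely echoes the fixed-point structure mentioned in the abstract.

```python
import numpy as np


def future_visitation_entropy(features, num_features, gamma=0.9):
    """Per-step entropy of the discounted empirical distribution of future features.

    `features` is a sequence of integer feature indices, one per time step.
    This helper is hypothetical and only meant to illustrate the idea.
    """
    features = np.asarray(features)
    horizon = len(features)
    rewards = np.zeros(horizon)
    # Unnormalised discounted visitation measure, built backwards in time:
    # d_t = (1 - gamma) * onehot(z_t) + gamma * d_{t+1}
    d = np.zeros(num_features)
    for t in reversed(range(horizon)):
        onehot = np.zeros(num_features)
        onehot[features[t]] = 1.0
        d = (1.0 - gamma) * onehot + gamma * d
        # Renormalise because the trajectory is finite (assumption of this sketch).
        p = d / d.sum()
        nonzero = p > 0
        rewards[t] = -np.sum(p[nonzero] * np.log(p[nonzero]))
    return rewards


if __name__ == "__main__":
    # A trajectory that keeps visiting new features earns higher entropy
    # at early steps than one that stays on a single feature.
    print(future_visitation_entropy([0, 1, 2, 3], num_features=4))
    print(future_visitation_entropy([0, 0, 0, 0], num_features=4))
```

In this toy version a step is rewarded when many distinct features lie ahead of it within the same trajectory, which matches the abstract's emphasis on visitation within individual trajectories.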

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv:2603.18965

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On Learning Intrinsic Rewards for Policy Gradient Methods

Neural Information Processing Systems, Mar-16-2026, 20:54:51 GMT

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al. that defines the optimal intrinsic reward function as one that when used by an RL agent achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.
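
The phrase "additive intrinsic rewards" can be made concrete with a small sketch: the policy learner sees the extrinsic reward plus a weighted intrinsic reward, and its returns are computed from that sum. The code below shows only this mixing step under stated assumptions. The function name `mixed_returns`, the weight `lam`, and the discount `gamma` are hypothetical choices for illustration, and the sketch does not cover how the intrinsic reward itself is learned, which is the actual contribution of the paper.

```python
import numpy as np


def mixed_returns(extrinsic, intrinsic, lam=0.01, gamma=0.99):
    """Discounted returns of the additive reward r_ex + lam * r_in.

    Hypothetical helper: `lam` weights the intrinsic reward and `gamma`
    is the discount factor; both defaults are arbitrary.
    """
    extrinsic = np.asarray(extrinsic, dtype=float)
    intrinsic = np.asarray(intrinsic, dtype=float)
    if extrinsic.shape != intrinsic.shape:
        raise ValueError("extrinsic and intrinsic rewards must have the same shape")
    rewards = extrinsic + lam * intrinsic
    returns = np.zeros_like(rewards)
    running = 0.0
    # Accumulate the discounted sum from the last step backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Sparse extrinsic reward at the end, dense intrinsic signal before it.
    print(mixed_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.0], lam=1.0, gamma=0.5))
```

Setting `lam=0.0` recovers the extrinsic-only baseline described in the abstract, which makes the two agents easy to compare in a sketch like this.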

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Genre: Research Report > New Finding (0.60)

Industry: Leisure & Entertainment > Games > Computer Games (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

On Learning Intrinsic Rewards for Policy Gradient Methods

Zeyu Zheng, Junhyuk Oh, Satinder Singh

Neural Information Processing Systems, Feb-12-2026, 20:13:39 GMT

Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Yali Du, Lei Han, Meng Fang, Ji Liu, Tianhong Dai, Dacheng Tao

Neural Information Processing Systems, Feb-11-2026, 09:36:50 GMT

Without a centralized controller, each agent is responsible for collaborating with the others based on its own decisions.

agent, artificial intelligence, machine learning, (15 more...)

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
Asia > China > Guangdong Province > Shenzhen (0.05)
Oceania > Australia (0.04)
(2 more...)

Industry: Leisure & Entertainment > Games (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems

Neural Information Processing Systems, Feb-10-2026, 09:00:18 GMT

Conversational Recommender Systems (CRS) actively elicit user preferences to generate adaptive recommendations. Mainstream reinforcement learning-based CRS solutions heavily rely on handcrafted reward functions, which may not be aligned with user intent in CRS tasks.

artificial intelligence, machine learning, optimization problem, (16 more...)

Country:

North America > United States > Virginia (0.05)
North America > United States > California > Santa Clara County > Los Gatos (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Neural Information Processing Systems, Dec-24-2025, 23:43:14 GMT

A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward. Prior studies have devoted much effort to reward shaping or to designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn for each agent an intrinsic reward function that diversely stimulates the agents at each time step. Specifically, the intrinsic reward for a specific agent is used to compute a distinct proxy critic for that agent, which directs the update of its individual policy. Meanwhile, the parameterized intrinsic reward function is updated towards maximizing the expected accumulated team reward from the environment, so that the objective is consistent with the original MARL problem. The proposed method is referred to as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR, and we show that LIIR can assign each individual agent an insightful intrinsic reward per time step.

agent, learning individual intrinsic reward, liir, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

On Learning Intrinsic Rewards for Policy Gradient Methods

Zeyu Zheng, Junhyuk Oh, Satinder Singh

Neural Information Processing Systems, Nov-20-2025, 16:27:07 GMT

In this paper we build on the Optimal Rewards Framework of Singh et al. [2010] that defines the optimal intrinsic reward function as one that when used by an RL agent achieves behavior that optimizes the task-specifying or extrinsic reward function.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Michigan (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
(2 more...)
